Mean Field Approach for a Statistical Mechanical Model of Proteins 



Pierpaolo Bruscolini 
Dipartimento di Fisica & INFM, Politecnico di Torino, 
C.so Duca degli Abruzzi 24, 1-10129 Torino, 



m 
o 
o 

(N 



o 

CO 

-4-* 



o 
o 



> 

00 
00 
(N 

in 
o 
m 
o 

-I— > 



i 

-a 
c 

o 
o 



X 
S3 



Fabio Cecconi 

Dipartimento di Fisica Universita "La Sapienza" & INFM Unita di Romal, P.le A. Moro 2, 1-00185 Roma, Italy 

We study the thermodynamical properties of a topology-based model proposed by Galzitskaya 
and Finkelstein for the description of protein folding. We devise and test three different mean- 
field approaches for the model, that simplify the treatment without spoiling the description. The 
validity of the model and its mean-field approximations is checked by applying them to the /3-hairpin 
fragment of the immunoglobulin-binding protein (GB1) and making a comparison with available 
experimental data and simulation results. Our results indicate that this model is a rather simple 
and reasonably good tool for interpreting folding experimental data, provided the parameters of the 
model are carefully chosen. The mean-field approaches substantially recover all the relevant exact 
results and represent reliable alternatives to the Monte Carlo simulations. 



I. INTRODUCTION 



The free-energy landscape of protein molecules rep- 
resents the key-information for understanding processes 
of biomolecular self-organization such as foldingiS* 3 ^ 
The free-energy landscape, indeed, determines all ob- 
servable properties of the folding process, ranging from 
protein stability to folding rates^&L&ii Unfortunately, 
for real proteins, sophisticated all-atom computational 
methods fail to characterize the free-energy surface, since 
they are currently limited to explore only few stages 
of the folding process. As an alternative, one can ar- 
gue that, taking into account all the complex details 
of chemical interactions is not necessary to understand 
how proteins fold into their native state. Rather, ele- 
mentary models incorporating the fundamental physics 
of folding, while still leaving the calculation and sim- 
ulations simple, can reproduce the general features of 
the free-energy landscape and explain a number of ex- 
perimental results. This attitude, typical of a statis- 
tical mechanics approach, agrees with the widely ac- 
cepted view that "a surprising simplicity underlies fold- 
ing" (Baker—). In fact several experimentaliiii2*iiii4 and 
theoretical studiesi^iSiiLiS indicate the topology of pro- 
tein native state as a determinant factor of folding. As ex- 
amples, one can mention the fact that even heavy changes 
in the sequence that preserve the native state, have a lit- 
tle effect on the folding rates— i^S Moreover, the latter are 
found to correlate to the average contact order j2i which 
is a topological property of native state. Finally, proteins 
with similar native state but low sequence similarity of- 
ten have similar transition state ensembles— 

Within this context, elementary models 22,23 - 24 - 25,26 
which correctly embody the native state topology and 
interactions, are believed to be useful in describing the 
energy landscape of real proteins. In this paper, we study 
one of such topology-based models proposed by Galzit- 
skaya and Finkelstein (GF), 27 which was developed to 
identify the folding nucleus and the transition state con- 



figurations of proteins. The model employs a free-energy 
function with a reasonable formulation of the conforma- 
tional entropy, which is certainly the most difficult con- 
tribution to describe. The energetic term, instead, takes 
into account only native state attractive interactions. In 
the original paper j2I the model was combined with a dy- 
namic programming algorithm to search for transition 
states of various proteins. To reduce the computational 
cost of the search, two kinds of approximations were in- 
troduced: the protein was regarded as made up of "chain 
links" of 2-4 residues, that fold/unfold together; besides, 
only configurations with up to three stretches of contigu- 
ous native residues were considered in the search ( "triple- 
sequence approximation" ) . As shown in Ref . |2^, the ef- 
fect of such assumptions is a drastic entropy reduction 
of the unfolded state and possibly of the transition state. 
This produces free energy profiles very different from the 
true ones, thus spoiling the evaluation of ^-values. 

Here, we apply the model in a more general statistical 
mechanical philosophy: namely, we develop three differ- 
ent mean-field approaches of increasing complexity, and 
compare their prediction with the exact results, obtained 
by exhaustive enumeration of all the configurations, in 
the case of a 16-residues-long peptide (C-terminal 41-56 
fragment of the streptococcal protein G-B1)SS which is 
known to fold, in isolation, to a /3-hairpin structured 
Our main goal here is to test the model against exper- 
imental findings and to test the mean-field predictions 
against the exact results. In the future we will use this 
knowledge to apply the appropriate mean-field approach 
to the case of real proteins, for which exhaustive enumer- 
ation is unfeasible. 

The paper is organized as follows. In the next sec- 
tion, we present and describe the main features of the 
GF model. In section II, we introduce and discuss three 
mean-field approximations: the usual scheme, and two 
other approaches stemming from the knowledge of the 
exact solution for the Mufioz-Eaton model. 28 In section 
III, we apply the model and its mean-field approxima- 
tions to study the folding transition of the /3-hairpin and 



discuss our results. 



II. DESCRIPTION OF 
GALZITSKAYA-FINKELSTEIN MODEL 

The GF model assumes a simple description of the 
polypeptide chain, where residues can stay only in an 
ordered (native) or disordered (non-native) state. Then, 
each micro-state of a protein with L residues is encoded 
in a sequence of L binary variables s = {si, s%, Sl}, 
(s, = {0, 1}). When Si = 1 (s, = 0) the i-th residue is in 
its native (non-native) conformation. When all variables 
take the value 1 the protein is correctly folded, whereas 
the random coil corresponds to all 0's. Since each residue 
can be in one of the two states, ordered or disordered, the 
free energy landscape consists of 2 L configurations only. 
This drastic reduction of the number of available config- 
urations represents, of course, a restrictive feature of the 
model, however, follows the same line of the well known 
Zimm-Bragg models widely employed to describe the 
helix to coil transition in heteropolymers. 

The effective Hamiltonian (indeed, a free-energy func- 
tion) is 



H(s) 



A, 



where S(s) is given by: 



5(s) = R 



TS(s) . 



<Szoop( s ) 



(1) 



(2) 



R is the gas constant and T the absolute temperature. 
The first term in Eq. (JTJ is the energy associated to na- 
tive contact formation. Non native interactions are ne- 
glected: this assumption, that can be just tested a pos- 
teriori, is expected to hold if, during the folding process, 
the progress along the reaction coordinate is well depicted 
on the basis of the native contacts (that is, the reaction 
coordinate(s) must be related to just the native contacts). 
Moreover, such progress must be slow with respect to 
all other motions, so that all non-native interaction can 
be "averaged-out" when considering the folding process^ 
Ay denotes the element i,j of the contact matrix, whose 
entries are the number of heavy-atom contacts between 
residues i and j in the native state. Here we consider 
two amino-acids in contact, when there are at least two 
heavy atoms (one from aminoacids i and one from j) 
separated by a distance less than 5A. The matrix A em- 
bodies the geometrical properties of the protein. Notice 
that, in the spirit of considering the geometry more rele- 
vant than the sequence details, every (heavy) atom-atom 
contact is treated on equal footing: the chemical nature 
of the atoms is ignored, together with a correct account 
for the different kind of interactions. 

The second term in Q is the conformational entropy 
associated to the presence of unfolded regions along the 
chain, and vanishes in the native state. 



More precisely the first term in Eq. J5J is a sort of 
"internal" entropy of the residues, that can be attributed 
to the ordering of the main and side-chains' degrees of 
freedom upon moving from the coil to the native state. 
Indeed, qR represents the entropic difference between the 
coil and the native state of a single residue, as can be 
noticed by considering that in the fully unfolded state 
the first and last term vanish, and the entropy is given 
by qLR. 

The quantity RSi oop in Eq. @, instead, is the entropy 
pertaining to the disordered closed loops protruding from 
the globular native stated it reads: 

j'-i 

i<j k=i+l 



According to Ref. |23, we take 



32 



J ( r ij) = -o ln l i_ il _ 7 



3 ri 



AAa\i-j\ 



(4) 



In this context a disordered loop is described by a strand 
of all "0"s between two "l"s: for instance the configu- 
ration 11000000111100011 contains two loops involving 
6 and 3 residues respectively. The product in expres- 
sion (PJ warrants that only uninterrupted sequences of 
"0" can contribute to the loop entropy. The configuration 
of a disordered loop going from residues + to (j — 1), 
with i and j in their native positions, is assimilated to a 
gaussian chain of beads (C a atoms) with end-to-end dis- 
tance Tij , the latter being the distance between C a atoms 
of residues i and j in the native state. The parameters 
a = 3.8 A and A = 20 A are the average distance of 
consecutive C Q 's along the chain and persistence length 
respectively. Other forms for Si oop could also be used 
(see, e.g. Ref. 12^) : yet, here we are interested in evaluat- 
ing the original GF model and devising good mean-field 
approximations to it, and we will not discuss this sub- 
ject any further. The interested reader may refer to the 
original article o 27 i 31 for a derivation of Eq. (0}. 



III. MEAN FIELD APPROACHES TO THE GF 
MODEL 

Mean field approach (MFA) is certainly the first at- 
tempt to investigate the thermodynamical properties of 
complex systems, because it provides a qualitative pic- 
ture of the phase diagram that in many cases is only 
partially modified by more accurate refinement of the 
theory. In its variational formulation, MFA, for a system 
with Hamiltonian H and corresponding free-energy F, 
starts from the Bogoliubov-Feynman inequality 



F < F + (H ~ H ) 



(5) 



where Hq is a solvable trial Hamiltonian Fo is the corre- 
sponding free-energy, both depending on free parameters 



2 



x = {x\ ■ ■ ■ xl\ (variational parameters). Such parame- 
ters have to be chosen to minimize the second member 
of JSJ to get the minimal upper bound of F and accord- 
ingly its better approximation. This method defines a 
variational free-energy 



F var — Fq + (H — Hq)o ■ 



(6) 



whose minimization leads to the self consistent equations 
that in their general form read 



dHo 

dxi 



(H - H )o -({H-H 



dH 

dxi 



0, (7) 



with I — 1, . . . , L. We implement different versions of the 
MFA for the GF model that differ each from the other 
by the choice of the trial Hamiltonian. 



A. Standard Mean Field Approach (MFA1) 

To implement the standard MFA for the GF model, 
we regard the free energy function Q as an effective 
Hamiltonian. 

The trial Hamiltonian we choose, corresponds to ap- 
plying an inhomogeneous external held with strengths 
x = {xi, ...,xl} along the chain 



L 

H = ^2 XiSi , 
»=1 



(8) 



with Xi to be determined by minimizing the variational 
free-energjiS 



L 



F„ or (x,T) = > fo (xi , T) + (H-H ) , (9) 



=i 



where J^. /o(£i, T) is the free energy associated to Ho, 



(10) 



fo{xi,T) = -— In <J 1 + exp(-/3xi) 



Thermal averages, performed through the Hamiltonian 
H , factorize (siSj...s k )o = (si)o(sj)o—(sk)o- The ap- 
proximate average site "magnetization" m, = (si)o de- 
pends only on the field X{, and is given by 



1 



m. 



dFp = 

dxi 1 -I- e~xjp(/3xi 



(11) 



Instead of working with external fields x^s, it is more in- 
tuitive to use the corresponding "magnetizations" m^'s, 
writing F var as a function of the mi's. Due to the choice 
of Hq, Eq. (J8J, and to the expression evaluating 
the thermal average (H)o amounts to replacing, in the 
Hamiltonian Eq. |JT]|. each variable Si by its thermal av- 
erage mi Ullj l- In the end we get: 

L 

F var (m,T) =e^A ij m i m j -TS(in)+Rr^2g(mi), 

ij i—l 

(12) 



where g(u) = uln(u) + (1 — u)ln(l — u) and S(m) is 
obtained from Eq. |J2J by substituting Sj — > m,. The 
last term corresponds to Fo — (-ffo)o m 

Eq. ©: it is 

the entropy associated to the system with Hamiltonian 
Hq and is the typical term that stems from this kind of 
MFA. 33 Carrying out the minimization of function l|12|) 
with respect to m leads to self-consistent equations: 



/(mi) = Ajj-mj - RT^q - 



drrii 



(13) 



Equations (|13f) can be solved numerically by iteration 
and provide the optimal values of the magnetizations 
that we denote by m*. Once the set of solutions m* 
is available, we can compute the variational free-energy 
F var (m*) that represents the better estimation of the 
system free-energy F. 

In a mean-field approach, the (connected) correlation 
function between residues i and j, 

c tj( T ) = (sjSj) ~ ( s i)( s j) > (14) 
can be recovered through a differentiation of F var (m, T): 



dF„ 



dmidnij J m » 



(15) 



where the subscript indicates that the derivative is eval- 
uated on the solutions m*. Explicitating each term of 
F var we obtain the expression 



c« 1 (T) = 



m*(l — m*) 13 \ drriidm 



d 2 Si oop (m) 



J / m 



The correlation function matrix is given by the inversion 
of above matrix. 



B. Second Mean Field Approach (MFA2) 

The quality of the MFA improves when we make a 
less naive choice for Ho. One of the possible Ho is sug- 
gested by the Muhoz-Eaton mode l 25 i 34 i 35 that was proven 
to be fully solvable in Ref. |2^. In fact, even if the two 
models are not equivalent, there is an interesting formal 
relationship between that model and the present one. 
In the Muhoz-Eaton model, the (effective) energy of a 
configuration results from the contributions coming from 
the stretches of contiguous native residues it presents, 
plus an entropic contribution from each of the non-native 
residues! 28 ' 34 

Here the effective energy Eq. Q boils down to the con- 
tributions of stretches of contiguous non-native residues 
(the loops), plus the sum of pairwise interactions of na- 
tive residues. This latter term makes the model harder 
to solve than Muhoz-Eaton's one. If we neglect this in- 
teraction, and replace it with a residue-dependent contri- 
bution, the model can be mapped on the Muhoz-Eaton 



3 



model. Indeed, a trial Hamiltonian of the kind: 

L 



H {x)=J2x iSi -TS(s); 



(17) 



i=l 



with ^({si}) given by Eqs. can be recast as Hq = 

C + Hme upon the substitution Sj — > (1 — s,), where 
C = ^2 Xi is a constant, and 



roughly approximated. This new version aims to bet- 
ter incorporate the energy contributions and we shall see 
that results are in excellent agreement with the exact so- 
lution obtained by exact enumeration on the /3-hairpin. 
We consider the set of configurations of the proteins with 
M native residues (M = 0, ...,L). We then take as the 
trial Hamiltonian 



H ME = ^2 \ u v II Sk ) + E 



(18) 



k—i 



with 



RT[Ji-l.j + l — Ji.j + l — Ji-l,j + (1 — ^ 

(19) 

RT(q + Ji_ hi+1 )-Xi, (20) 



(here J itj = J(r#) of Eq. |gj); J , 4 = Ji,L+i = 0). 
Now the trial Hamiltonians reads formally as the Munoz- 
Eaton Hamiltonian: see Eq. (1) of Ref. 28|, where the 
symbol m.j was used instead of Si. 

Hence, we choose Eq. I|17f) as the trial Hamiltonian, 
and write down the mean field equations Eq. Q: 



si 



1=1 



(21) 



for I = 1, . . . , L, These equations involve the functions 



Ci = (si)o 
C t ,j,i = (siSjSi) 



(22a) 
(22b) 
(22c) 



where averages are evaluated by the same transfer matrix 
technique as in Ref. |2^ 

Using the fact that CVM is exact for the Munoz-Eaton 
Model, it can also be proven that the three-point func- 
tions Cij^i can be written as a function of the two-point 
ones: Cy,; = CijCj^/Cj, for i < j < I 36 . This greatly 
reduces the computational cost of minimizing the varia- 
tional free energy and makes the approach particularly 
suitable for long polypeptide chains. 

Correlations Cy could still be evaluated as in Eq. (|15|l , 
but now the dependence of F var upon irij cannot be 
worked out explicitly, and the derivatives must be evalu- 
ated resorting to the dependence on the fields Xj : namely 
dFyar/dm.i = J2j(9xj/dmi)(dF var /dxj). However, this 
entails to evaluate the four-point averages (siSjSkSi}o, 
with a consequent relevant computational cost, for this 
reason, we will not pursue this strategy in the following. 



L 

E 

M=0 



<5(A/-£ jSl )#o M) (x) 



where 5(*) is the Kronecker delta, and H^ M%> is the Hamil- 
tonian restricted to the configurations with M natives: 



(23) 



ffo (M) M 



M- 1 



TS{s) 



(24) 



with ii — (1/2) £ i,j^-i,j- Each residue i, in a 

generic configuration with M native residues, feels an 
interaction ii which it would feel in the native state, 
weakened by a factor (M — l)/(£ — 1) (accounting for 
the fact that not all the residues are native), times the 
external field Xi, to be fixed by the mean field procedure. 

This scheme is useful for taking correlations into ac- 
count in a better way than in the usual MFA, so to gain 
some insight on the parts of the chain that fold first and 
to investigate folding pathways. In this framework the 
partition function is: 



L L (M) 

M=o M=0 { Si =0,l} 



where the symbol (M) above the sum indicates that 
the sum is restricted to configurations with M native 
residues. The mean field equations Q reads 



C. Third Mean Field Approach (MFA3) 

In the previous MFA version, the entropic term was 
treated exactly while the energy contribution was very 



si 



E< 



0. 



i=l 



(26) 



4 



for each I, where 



(M) 



^: = E 



M- 



c=E 



(M-1) 2 C ( M ) 



' (£-1) 

(A/) 



(M) 



PS° = ^ E ^exp(-/^ M) ) 



<5 = ^ E Wi-ieKpC-^) 



(27a) 
(27b) 
(27c) 
(27d) 
(27e) 
(27f) 



{Si} 



are the contributions to the correlation associated to 
configurations with M native residues. The transfer- 
matrix method applied in Ref. allows keeping track 
separately of the contributions coming from the config- 
urations with a given total number of native residues, 
therefore it is possible to evaluate exactly the partition 
functions Zq\ and all the averages Eq. I|27|l involved 
in the mean field equations Eq. i|26|) . The computational 
cost is relevant, though: in fact, due to the necessity of 



evaluating all C\j and some C t - JU / (the ones actually 
occurring in Eq. (J2SJ), 0(L 6 ) elementary multiplications 
are required. As far as correlations c^- are concerned, the 
same discussion of the MFA2 case holds. 



,(M) 



IV. THE ,3-HAIRPIN 

We compare the MFA results with numerical 
simulations on the /3-hairpin, the fragment 41—56 
of the naturally occurring protein GB1 (2GB 1 in 
the Protein Data Bank)i2S This peptide has been 
widely studied experimentallyj 2 ^* 3 ^ 3 ^ through all-atom 
simulationa 39 i 40 i 41 and simplified modelsi 25 ' 34 i 42 Thus it 
represents a good test for the validity of the model and 
its approximations. Since the /3-hairpin contains only 
L = 16 aminoacids, we can carry out exact enumeration 
over the 2 16 = 65536 possible configurations to compute 
explicitly the partition function 



{*♦} 

of the model. Once the function Z is known, all the ther- 
mal properties are available and it is possible to com- 
pletely characterize the thermal folding of the hairpin 
peptide. However, first, we have to adjust the model free 
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FIG. 1: Fraction of native residues Q (see 1301 during thermal 
folding, according to the GF model. Full dots are the exact 
result obtained by exhaustive enumeration. Dashes and full 
lines indicate MFA1 and MFA3 approximations, respectively. 
Inset: Fit of the hydrophobic cluster (W43 - Y45 - F52 - 
F54) population Qhyd (solid) to the experimental data from^ 
(triangles) . 



parameters e and q to reproduce experimental data on 
the hairpin equilibrium folding. Experimental results on 
tryptophan fluorescence |2i show that, in the folded state, 
the 99% of molecules contain a well formed hydrophobic 
cluster made of Trp43, Tyr45, Phe52 and Val54. In the 
model, the formation of the hydrophobic cluster is de- 
scribed by the behaviour of the four-points correlation 
function Qhyd = (S3S5S12S14} (notice that, here and in 
the following, residues are renumbered from 1 to 16, in- 
stead of 41—56). The choice of the model parameters 
q = 2.32 and e = —0.0632 (kcal/mol) provides the best 
fit of Qhyd to the behavior of the experimental fraction 
of folded molecules (cfr. inset of Fig. ^ with Fig. 3 of 
Ref. l3~Hb We can now assess the goodness of the model 
and its mean-field approximations, by comparing their 
predictions with the experimental results and simula- 
tions. 

Averages and correlations within the mean-field 
schemes will be evaluated as follows: for MFA1, the self- 
consistent mean-field equations l|13f) are solved by itera- 
tion, substituting an arbitrary initial value for m at the 
right-hand side of Eq. I|13fl . evaluating rrii from the left- 
hand side, and substituting again the latter value in the 
right-hand side, until convergence is achieved. 

In the present case, this procedure converges quickly 
to two different solutions (depending on the starting val- 
ues of the fields), corresponding to different phases: the 
folded one (m, ~ 1) at low temperature and the un- 
folded (n%i ~ 0) at high temperature. Starting from the 
unfolded phase and lowering the temperature the solu- 
tion of Eqs. (|13fl remains trapped into a set of misfolded 
metastable states. Only at temperatures well below the 
folding temperature Tp the solution collapses into the 
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T(K) 

FIG. 2: Comparison between MFA and the exact enumera- 
tion. Behaviour of specific heat (kcal mol -1 K _1 ) and free- 
energy (kcal mol - ) with temperature, as obtained by the 
exact enumeration of the GF model applied to the hairpin. 
Dots indicate the exact results, while dashed and solid lines 
correspond to MFA1 and MFA3, respectively. In the inset, the 
mean-field free energies and the exact free energy are plotted 
against temperature rendering conventions as before. Notice 
the crossing of two branches of MFA1 at the transition tem- 
perature. 

one representing the folded state. The opposite hap- 
pens when the temperature is increased starting from 
the folded phase. This is a typical scenario of first-order 
like transitions, which is reproduced by the mean field ap- 
proach. The situation is well illustrated by the behaviour 
of the mean field free-energy, which exhibits two branches 
Fi (T) and F 2 (T) as shown by the dashed lines in the inset 
of Fig. |21 The intersection of the two branches defines the 
mean-field folding temperature. At a given temperature, 
the free-energy of the protein is obtained by selecting the 
minimum of the two branches 

F(T)=win{F 1 {T),F 2 (T)}. (28) 

In this approximation other observables present a jump 
at transition: this reflects the fact that in the ther- 
modynamic limit (here corresponding to infinitely long 
proteins), only the solution with the lowest free-energy 
would be physical. To take into account finite-size effect, 
we decide to introduce an interpolating formula to deal 
with a continuous quantity: 

(O) c- pFl (Oh+c-^(Oh (2g) 

where (0)\ and (0)2 are the averages of the observable in 
the above mentioned branches. In this way we compute 
the average magnetization (i.e. the fraction of correctly 
folded residues) of the protein: 

1 L 

Q^EW, (30) 

1=1 



as well as its energy (E). In the latter case (E}\, (E)2 
are evaluated as (E) a = d((3F a )/d(3. 

Differentiating the energy with respect to the tem- 
perature, we get the specific heat, reported in Fig. |21 
Notice that this is the correct recipe to take into ac- 
count also the contributions to the specific heat com- 
ing from the change of the native fraction of molecules: 
the alternative one, obtained with the direct application 
of Eq. (El to the specific heats C\ = d{E) 1 /dT and 
C 2 V = d(E) 2 /dT, would neglect the change in the number 
of folded molecules, and account only for the variations of 
the energy within the pure native or unfolded state. For 
the same reason, Eq. <|29|) is not useful to match the cor- 
relation functions evaluated on the two branches. It 
would yield only a linear superposition of the Cij 's relative 
to native and unfolded states, while the correct functions 
should account for the contributions coming from all the 
configuration space. 

Coming to MFA2, we observe that it keeps exactly 
into account the entropic term Eq. (J2JI. Yet, solving the 
mean-field equations yields again two different solutions 
at each temperature. Thus, MFA2 presents the same 
kind of problems in characterizing the folding transition 
states as MFA1. This is why in the following we will 
present results just for MFAl and MFA3, that behave in 
a substantially different way. 

With MFA3, in fact, a unique set of fields x(T) is ob- 
served, independent of the starting values, for any tem- 
perature in the interesting range around the transition, 
and no empirical connection rule Eq. I|29|) is required. 
Moreover, at odds with MFAl and MFA2, the difference 
between F var and Fq in Eq. ijBJl happens to be negligible 
at all the relevant temperatures: Fq is a very good ap- 
proximation to F var . This suggests that the correct cor- 
relation functions, which would be very hard to evaluate, 
can be replaced by the ones involving averages with the 
trial Hamiltonian Hq: Cij ~ (siSj)o — (si}o{sj)o. Thus, 
within MFA3 it is possible to give a substantially correct 
characterization both of the native and unfolded states, 
and of the folding nucleus. 

In Fig. n we plot Q of Eq. I|3UI) as a function of the 
temperature, for the original model, for MFAl (with 
the help of Eq. (J2SJl) and MFA3. At low temperatures, 
where the protein assumes its native state, Q = 1, while 
Q ~ in coil configurations (i.e. at high tempera- 
tures). Mean field approximations appear to be slightly 
more "cooperative" than the original model, according to 
their steeper sigmoidal shape. The temperature at which 
Q = 1/2 is an estimate of the folding temperature: we 
have Tp ~ 306 — 306.5 K for the original model, and 
T F ~ 305 K for both MFAl and MFA3. 

In Fig. |21 we plot the specific heat: 

_ (E 2 ) - (E) 2 dU 
Cv ~ RT 2 ~ dT {6 ' 

and the free energy. The peak of C v , which provides 
another definition for the folding temperature, occurs 
around Tp ~ 309.5 K for the exact model and its mean 
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FIG. 3: Free energy landscape for the hairpin, i.e. plot the 
free-energy (kcal moP 1 ) of the system vs. the number of 
native residues M . Solid lines: exact results for the GF model; 
dotted lines: MFA3. Temperatures are 285 K, 300 K, 315 K, 
330 K, 345 K, 360 K, from top to bottom. 



i 1 

o 3-5 
□ 3-12 




280 300 320 340 360 

T(K) 

FIG. 4: Temperature behavior of the correlation function 
Cij(T) of native contacts involving the Tryptophan (Trp45). 
Symbols correspond to the exact solution, while solid lines 
indicate the MFA3 results for the contacts 1—3, 3—5, 3—12, 
3—13, 3—14, from bottom to top. 



field approximations. Notice that MFA1 and MFA3 sub- 
stantially recover the position of the exact peak, even if 
the transition appear a little sharper in the mean-field 
cases. 

The above estimates of the folding temperatures are 
somewhat higher than the experimental ones, Tp ~ 298 
and T F ~ 295.3 K^I. Interestingly, T F appears to 
be higher than the experimental value also for "united 
atom" simulation^ (T = 308 K in the Go-model case, 
T = 333 K with the full potential introduced in that 
paper), and for all-atoms simulations^ 

Free-energy profiles, for various temperatures, are plot- 
ted in Fig. |2| versus the number of native residues M, 
that we use as the folding reaction coordinate. Profiles 
suggest that the /3-hairpin folding is well described by a 
two state process, i.e. F{M) exhibits two minima sep- 
arated by a barrier that has to be overcome in order 
to reach the native/unfolded state. Notice, though, that 
this doesn't rule out the possibility that folding might not 
be a two-state process in this case: this could happen if 
the number of native residues M was not a good reaction 
coordinate,^ Other alternative order parameters should 
be considered, in addition to M, to completely ascertain 
the nature of the transition. 

The comparison between exact and mean-field results 
reveals that the barrier appears to be overestimated in 
the MF scheme, where it is also shifted towards higher 
values of the reaction coordinate: again, the MFA ap- 
pears to be more cooperative than the original model. 
Notice however that the free-energy and position of the 
native and unfolded minima, and hence the stability gap, 
are correctly recovered, especially at temperatures close 
to transition (i.e. the second and third plots from top 
down) . 

Another interesting characterization of the folding 
pathway comes from the temperature behavior of the 



pairwise correlation functions between residues 

c ij (T) = (s i s j )-(s i )(s j ), (32) 

that provides an insight on the probability of con- 
tact format ion during the thermal folding, as shown in 
Refs. 1441451 

In fact, each function Cij(T) develops a peak at a char- 
acteristic temperature, which can be regarded as the tem- 
perature of formation/disruption of the contact i — j. In 
Fig. 01 we plot the correlation functions between Trp45 
and residues to which it is in native interaction. The 
height of each peak indicates the relevance of the contact 
from a thermodynamical point of view4^^S^i Thus, 
each contact turns out to be characterized thermodynam- 
ically by the location (temperature) and the height of the 
corresponding peak. This provides a criterion for rank- 
ing c ontacts in order of temperature and relevance (see 
Refs. |44j45j). For example, at the folding temperature 
Tp, the contacts that mainly contribute to the folding 
transition must be searched among those with the charac- 
teristic temperature located around Tp and with highest 
peak of Cij. Correlation analysis for the hairpin is sum- 
marized in Table[IJ where we report the temperature and 
the height of correlation function peaks, between residues 
which share a native contact. Contacts are sorted in tem- 
perature and whenever a tie occurs the sorting runs over 
the heights of the peaks. In this way, we can have a pic- 
ture of how contacts are established during the thermal 
folding. Assuming that the order of contact stabilization 
upon decreasing the temperature reflects the order of for- 
mation during folding, this is also a qualitative indication 
of the folding pathway. 

We see, from the first three columns of Table HJ that 
GF model predicts that the /3-hairpin folding begins with 
the formation of contacts 6—11 and 6—9, 9—11 and 6—8, 
located in the region between the turn (8—9) and the 
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Exact 

Contact T c har Corr. Peak 



6-11 
6-9 
9-11 

6- 8 

5- 11 

6- 12 

7- 9 

6- 10 
5-12 

5-7 

10- 12 

7- 10 

8- 10 

11- 13 
4-6 

4- 12 

5- 13 
3-5 

4-13 
3-12 

12- 14 

3- 13 

4- 14 
2-4 

2- 13 

13- 15 

3- 14 
2-14 

1-3 

1- 14 

14- 16 

2- 15 
1-15 



316.5 
316.5 
316.5 
316.5 
315.5 
315.5 
315.5 
315.5 
315.0 
315.0 
315.0 
314.5 
314.5 
314.0 
314.0 
313.5 
313.5 
312.0 
310.5 
310.5 
310.5 
309.5 
309.0 
307.0 
306.5 
306.0 
304.5 
300.0 
298.5 
296.5 
290.5 
273.0 
273.0 



0.21457 
0.21346 
0.21243 
0.18262 
0.20457 
0.20214 
0.18462 
0.16488 
0.21670 
0.18461 
0.15697 
0.15048 
0.14339 
0.17713 
0.17575 
0.19244 
0.18681 
0.14805 
0.20932 
0.15090 
0.14416 
0.17364 
0.16487 
0.08024 
0.08037 
0.05241 
0.20284 
0.11045 
0.03678 
0.03570 
0.09773 
0.12746 
0.05005 



Mean Field 
Contact T c har Corr. Peak 



6-11 
9-11 

6- 9 
6-12 

4- 6 
5-11 

11- 13 
5-12 

4- 12 

5- 13 

5- 7 

7- 9 

6- 8 
10-12 

6- 10 
4-13 

7- 10 

8- 10 
3-5 

3-12 

3- 13 

12- 14 

4- 14 
3-14 

1- 15 
2-4 

2- 13 
2-14 

13- 15 
2-15 

1-3 
1-14 

14- 16 



312.0 
312.0 
312.0 
311.5 
311.5 
311.5 
311.5 
311.5 
311.5 
311.5 
311.5 
311.5 
311.5 
311.5 
311.5 
311.0 
311.0 
311.0 
310.5 
310.5 
310.5 
310.5 
310.0 
309.5 
308.5 
307.5 
307.0 
306.0 
306.0 
305.0 
304.5 
304.5 
291.0 



0.21582 
0.21553 
0.21441 
0.21535 
0.21449 
0.21449 
0.21447 
0.21414 
0.21031 
0.21007 
0.19342 
0.19198 
0.18361 
0.17117 
0.16936 
0.20685 
0.15091 
0.14385 
0.19785 
0.19526 
0.19284 
0.19111 
0.18529 
0.17464 
0.08026 
0.08349 
0.08081 
0.07482 
0.04143 
0.01189 
0.02493 
0.01934 
0.00293 



TABLE I: Ranking of native contacts according to char- 
acteristic temperature and height of the correlation peak4- 
Contacts 1-16 and 2-16 have been neglected: they yield bad 
results because they are not stable even in the experimental 
native structurei— The first three columns refer to the exact 
solutions, the others to MFA3 results. 



hydrophobic cluster. Then, upon lowering the temper- 
ature, the folding proceeds with the formation of the 
other contacts that complete /3-hairpin structure. This 
is at odds with the results of more detailed models and 
simulationa22iiMi predicting that folding starts with the 
formation of contacts between the side chains of the hy- 
drophobic cluster, and proceeds with the stabilization of 
the hydrogen bonds in the loop region (there is no agree- 
ment on the order of hydrogen-bonds formation, though). 
GF model predictions are different also from those of 



the Munoz-Eaton model^S^i where the hairpin starts 
folding from the loop region and proceeds outwards in 
a zipper fashion. Experimental results relying on point 
mutations^ witness the importance of the hydrophobic 
residues 3, 5, 12 and, to a minor extent, 14, in stabiliz- 
ing the hairpin structure. Remarkably, contacts between 
residues 6, 9, 11 appear to be partially present also in 
denaturing conditions. 38 

It is interesting to notice, however, that, according to 
Table H| contacts 3-13, 4-14, 3-12, 12-14, 4-13 of 
the hydrophobic cluster are mainly established around 
the folding temperature, which suggests that also in GF 
model the hydrophobic cluster plays a central role. This 
is a nice feature of the model because it is consistent 
with the experimental evidence (fluorence signal) for the 
formation of the tryptophan hydrophobic environment at 
the folding. 

The estimation of correlation functions provided by 
MFA3 is only in qualitative agreement with exact results 
(see Fig. 0}: contacts are formed in a narrower range of 
temperatures, and a direct comparison would be mean- 
ingless. However we can ask what kind of information 
can be extracted from the mean-field results, wondering, 
for instance, whether the ranking of contact formation 
provided by MFA3 is "statistically equivalent" to that 
given by exact solution. Thus, we apply the Spearman 
rank-order correlation testi^ 8 . This test amounts to com- 
puting Spearman correlation 



R s 



n(n 2 — 1) 



(33) 



where Xi, and yi are the integer indicating the positions 
of the i-th contact in the two ranking respectively. The 
parameter, R s is 1 when the order in the two ranks is the 
same Xi — yi, while R s = —1, when the order is reverse 
%i + y% = n. For data in Table [fl we obtain the value 
R s = 0.902, that has a probability P < 10~ 6 to take 
place if the null hypothesis of uncorrelated ranks holds. 
This indicates that the order between the contacts ob- 
tained with exact and approximate methods is extremely 
significative: the mean-field approach basically recovers 
the correct order of contact formation and relevance as 
obtained with the true original model. 

One of the most important experimental techniques 
for characterizing the folding nucleus of a protein (more 
precisely of a protein with two-state folding) consists in 
the evaluation of $— values. $— values measure the effect 
of "perturbation" introduced in a protein by site-directed 
mutagenesis^ A mutation performed on the i-ih residue 
may affect the thermodynamics and kinetics, by alter- 
ing the free-energy difference between the native and un- 
folded state (i.e. the stability gap) or the height of the 
folding/unfolding barrier. Its effect is quantified through 
the $- value, defined as 



*<(T) 



A(AF tu ) 
A(AF NU ) 



(34) 
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FIG. 5: Variation on the free-energy profile induced by the 
perturbation on all the interactions involving the sixth residue 
(Asp46) of the Hairpin. The variation (in kcal mol -1 ) is com- 
puted for both the exact solution and MFA3, at the respective 
temperatures of equal populations of the native and unfolded 
basins. Solid and dashed lines indicate wild-type and mu- 
tated profiles respectively for the exact solution; dot-dashed 
and points refer to wild-type and mutated profile respectively 
in the MFA3. 



where AF }U = F t -Fu, AF NU = F N ~F V , and A(AF tu ) 
and A(AFnu) are the variations, with respect to the wild 
type protein, introduced by the mutation in the folding 
barrier and stability gap. Experimentally, A(AF^u) is 
derived from the changes in the kinetic rates induced by 
different denaturant concentrations, while A(AFnu) is 
extracted from the changes in the equilibrium popula- 
tion. $-values are different for different mutations of a 
residue; in any case, a $- value close to one implies that 
the mutated residue has a native-like environment in the 
transition state and hence is involved in the folding nu- 
cleus. A value close to zero, instead, indicates that the 
transition state remains unaffected by the mutation, and 
hence the mutated residue is still unfolded at transition. 

In our theoretical description, a mutation at site i is 
simulated by weakening the strength of the couplings 
eAij of 1% between residue i and the others. We choose a 
small perturbation because we cannot predict what kind 
of rearrangements in the local structure, and hence in the 
contact map, a true residue-to-residue mutation would 
involve. Our choice warrants that the effect of mutation 
remains local and does not disrupt completely the state. 
In figure we show the effect of a "mutation" of the 
sixth residue (Asp46) on the free-energy profiles. 

To evaluate the <&- values, we compute the variations 
in free energy profiles induced by each mutation, for the 
exact solution and MFA3. Fjj and Fjy are evaluated as 
Fu,n = —RTln Zu,Ni where Zjj , Zjq are, respectively, 
the partition functions restricted to unfolded and native 
basins in the free-energy profile, i.e. the regions to the 
left and right of the top of the barrier in Fig. [3J Fj is the 
free-energy of the top of the barrier. Through expres- 



FIG. 6: Effects of "mutations" as measured by $— values on 
each residue. Full circles: $— values from the exact solution; 
open circles: $— values within MFA3 approach. The temper- 
atures are in both cases those whereby Fu — Fjy. Results 
depend only slightly on temperature, anyway. 



sion Eq. I|34|) we obtain the <f>— values for each residue. 
In Fig. [f)| we report the value distributions. There is 
a good overall correlation between the profiles, that in- 
crease and decrease together. This is a further confirma- 
tion that the relevant features of the model are conserved 
when applying the MF approach. Mean-field results yield 
smoother profiles, as it could be expected. The ends of 
the hairpin are characterized by low $-values, that be- 
come negative for MFA3: this would correspond to mu- 
tations that increase the stability gap but decrease the 
barrier, or vice-versa. According to these results, the 
folding nucleus would be made up by residues 6, 8, 9, 11, 
which is in contrast with the already mentioned simula- 
tions. 



V. CONCLUSIONS 

In this work we developed and discussed three differ- 
ent mean-field schemes for the Galzitskaya-Finkelstein 
model, that represent valid ways to deal with the model 
for characterizing the thermodynamical properties of a 
protein and its folding pathway as well. These ap- 
proaches offer viable alternatives both to the procedure 
proposed by Galzitskaya and FinkelsteinfSi and to MC 
simulations, that become computationally demanding for 
long polymers and usually affected from the sampling 
problems. We applied the model to the /3-hairpin frag- 
ment 41-56 of the GB 1 protein, since, for this simple 
system, mean field results can be compared with a brute 
force solution of the model, and both can be checked 
against experimental data and simulation published by 
other groups. 

Our results suggest that, as far as specific heat and sim- 
ple thermodynamic quantities are concerned, the stan- 
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dard mean- field MFA1 is enough to yield correct results, 
provided that one uses the recipe Eq. (|2*§f to connect 
the two branches of the solution. For more sophisticated 
quantities like free-energy profiles, correlations and <£>- 
values, MFA3 is to be preferred, since it correctly recov- 
ers the main features of the exact solution. The hope is 
that mean-field results are still representative of the ex- 
act ones in the case of longer and more complex proteins, 
where the latter cannot be evaluated. 

GF model itself yields results that are somewhat in 
contrast with the MC and MD simulations on more de- 
tailed models for the hairpin. This discrepancy is prob- 
ably due to the extreme simplicity of the hamiltonian 
Eq. JJ}, where no distinction is made among the different 
kinds of interactions, such as hydrogen-bonds, side chain 
hydrophobicity, and so on. Indeed, we expected that a 
model accounting just for the topology of the native state 



will not score very well when applied to the /3-hairpin, 
where detailed sequence information is relevant^ Pre- 
dictions of the model could possibly improve if these el- 
ements were taken into account. 
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